07/03/2020

Overview

Goals

  • Learn the basics of the R programming language
  • Learn the basics of Markdown writing
  • Learn how to combine R code and markdown text to create RMarkdown documents

Outcomes

  • TBD
  • Create RMarkdown summary document of things we learned today!

Don’t worry

R & RStudio Recap

RStudio Recap

Console and script panes in RStudio


  • Console: Run code, see print outs, see warnings, messages, and errors


  • Source: Run code from a script. Multiple scripts can be open at once.

Markdown

What is “Markdown”?

R Markdown

What is R Markdown?

https://rmarkdown.rstudio.com/
  • Integrate R code directly into your writing using basic Markdown syntax
  • Reference management integration
  • Reproducibility
  • Accessible learning curve


✏️ Very useful for writing summary reports, articles, etc.

R Markdown

We’re going to make this today!

Intro to

Objects in R

Everything in R is an “object”

Variables

Variable: a symbol that stores/represents some other value or set of values.

Think of variables as containers.

Variable assignment

Variables get assigned their value in R with a left arrow <-

  • Numeric variables represent numbers

x <- 5

  • This means the value of the variable we have named x is equal to 5.

  • String variables represent “strings” of text. Text has to be enclosed in quotes ""

my_var <- "hello"

  • This means the value of the variable my_var is equal to the text “hello”.

What do I name my variables?

Best practices in naming your variables

R is pretty flexible about variable names. Other languages are less so (*cough* Praat *cough*).

Some suggestions

Try your best to…

Variable names: Not okay

General rules:

Not allowed                  | Why
---------------------------- | ---------------------------------------------
1abc                         | Starts with a number
intelligibility%             | Contains a special character like !@#$%^&*, etc.
.1my_var                     | Starts with a period followed by a number (.1)
_my_var                      | Starts with an underscore (_)
special words (e.g., mean)   | Some words have special meaning to R and you shouldn’t overwrite them
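
The rules above can be checked in the console. This is a minimal sketch that relies on base R's make.names(), which rewrites any string into a syntactically valid name. The helper is_valid_name() is a made-up name for illustration, not part of R.

```r
# Hypothetical helper: a name is syntactically valid
# if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("1abc", "intelligibility%", ".1my_var", "_my_var", "int_mean")
is_valid_name(candidates)
# Only the last candidate, int_mean, comes back TRUE
```

Note that a word like mean passes this check because it is syntactically valid. Avoiding it is a matter of good practice, not a rule that R enforces.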

Variable names: Better

Allowed                    | Good because                       | Bad because
-------------------------- | ---------------------------------- | ---------------------------
x                          | short, lowercase                   | Not meaningful
intelligibility_scores     | meaningful, easy to remember       | Takes a long time to type
int_mean                   | meaningful, short                  | 🤷
int_mean_pd, int_mean_hc   | meaningful, shortish, descriptive  | A bit long

Variable names: Choose your case

Try it!

Numeric variables

In your console, do the following:

  • Create a variable x and assign it a value of some number
  • Create a variable y and assign it a value of some other number
  • Create some other variable (name it whatever you like!) and assign it a value of yet another number

String variables

In your console, do the following:

  • Create a variable a and assign it a value of some text (don’t forget to enclose the text in quotes!)
  • Create a variable b and assign it a value of some other text
  • Create a variable with another name and assign it yet another text value

Types of data

  1. Numeric
  • Integers (whole numbers) and doubles (numbers with a decimal part)
  2. Strings: collection of characters
  • e.g., “abc”, “the rainbow is a division of white light”
  • numbers and other non-letter characters can also be treated as strings
  3. Factors: categorical variables. Make up a finite set
  • e.g., “Blue” “Green” “Red”
  4. Logicals: special type that only has two values
  • TRUE vs FALSE (treated as 1 vs 0 in arithmetic)
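
To see which type R has given a value, you can ask with class(). The sketch below uses made-up example values; fav_colors is an illustrative name only.

```r
# class() reports how R stores a value
class(5)          # "numeric"
class(5L)         # "integer" (the L suffix marks a whole number)
class("abc")      # "character" (R's name for strings)
class("42")       # "character": a number inside quotes is text

# A factor stores a fixed set of categories, called levels
fav_colors <- factor(c("Blue", "Green", "Red", "Blue"))
class(fav_colors)   # "factor"
levels(fav_colors)  # "Blue" "Green" "Red"

class(TRUE)       # "logical"
TRUE + TRUE       # 2, because TRUE counts as 1 in arithmetic
```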

Types of data structures

  1. Data frames: collections of vectors
  2. Vectors: collection of elements of the same type (numbers, characters, factors, etc.)
  3. Matrices
  4. Lists
  5. Arrays
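
As a rough illustration, each of these structures can be built with a base R function. All names and values below are invented for the example.

```r
# Vector: elements of one type
scores <- c(80, 92, 75)

# Data frame: equal-length vectors stored as columns
results <- data.frame(speaker = c("a", "b", "c"), score = scores)

# Matrix: two-dimensional, one type throughout
m <- matrix(1:6, nrow = 2)

# List: can mix types and lengths
mixed <- list(name = "demo", values = scores, flag = TRUE)

# Array: like a matrix, but with any number of dimensions
arr <- array(1:24, dim = c(2, 3, 4))

str(results)  # compact summary of the data frame's columns
dim(m)        # 2 3
dim(arr)      # 2 3 4
```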

Comments

  • In R code, any line of code preceded by a hashmark (#) is not evaluated (i.e., it’s ignored)
  • You can write notes to your future self this way
# Set x equal to 0
x <- 0
# Now add 1 to the value of x
y <- x+1

R Packages & Libraries

Packages are bundles of code written to perform (typically) specific sets of tasks

  • Some packages come with R when you install it and are loaded automatically every time you start a session
  • Others you have to explicitly download
  • Many packages are hosted on CRAN - this is the official “home” of peer-approved packages
    • These can be installed using the function install.packages().
install.packages("tidyverse")
  • Other packages are not hosted on CRAN - many of these are excellent, but some may be less reliable.
    • Many of these are hosted on GitHub.com
    • These usually have to be installed using a function install_github() which is part of the devtools package.
For example, to install the emo package from Hadley Wickham’s GitHub page:
# install.packages("devtools")
devtools::install_github("hadley/emo")

Libraries

  • Libraries in R refer to where the packages are stored. When you install a new package, it gets automatically saved to a particular location (you don’t need to specify where). When you want to use the contents/functions of a package, you need to “load the library” using the library() function.
# load the tidyverse package
# expect a bunch of output messages (this is normal)
library(tidyverse) 
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   0.8.3     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
  • If you ask R to load a package you don’t have installed, or make a typo in the package name, R will yell at you (give you an error):
# Typo! what happens? R yells at you.
library(tidverse)
## Error in library(tidverse): there is no package called 'tidverse'
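
If you would rather get a friendly message than an error, one option is to check whether a package is installed before loading it. This is only a sketch: load_if_installed() is a hypothetical helper written for this example, built on base R's requireNamespace() and library().

```r
# Hypothetical helper: attach a package only if it is installed
load_if_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed. Check the spelling, ",
            "or run install.packages(\"", pkg, "\").")
    return(invisible(FALSE))
  }
  # character.only = TRUE tells library() that pkg holds the name as text
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

load_if_installed("tidverse")  # typo: prints a message instead of an error
load_if_installed("stats")     # ships with R, so this attaches normally
```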

Functions

Functions: A function is a named block of code that carries out a procedure. Calling the function by name lets you run several lines of code with a single line.

  • In other words, functions are actions.
  • Most functions take arguments
    • What do you want to act on?
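
A small sketch of both ideas, using invented values: calling a built-in function with arguments, and writing a function of your own. add_one is a made-up name.

```r
# Calling a built-in function: the argument is the thing to act on
scores <- c(80, 92, 75)
mean(scores)                      # 82.33333

# Many functions take additional named arguments
round(mean(scores), digits = 1)   # 82.3

# Writing your own function
add_one <- function(x) {
  x + 1
}
add_one(5)                        # 6
```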

Tidyverse

“The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.” www.tidyverse.org

  • Changed the game of R coding
  • Much more intuitive syntax (IMHO); more “English-like”
  • BUT not everyone likes it. Sometimes makes things easier, sometimes makes things more complicated.
  • We’ll be using both
  • Core packages: ggplot2, dplyr, tidyr, readr, stringr…
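
To give a feel for the difference in syntax, here is the same small task written both ways. This sketch uses mtcars, a dataset that ships with R, rather than the workshop data, and the dplyr part only runs if dplyr is installed.

```r
# Base R: keep the rows where cyl equals 4, then average mpg
four_cyl <- mtcars[mtcars$cyl == 4, ]
base_result <- mean(four_cyl$mpg)

# Tidyverse (dplyr): the same steps, chained with the pipe %>%
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  tidy_result <- mtcars %>%
    filter(cyl == 4) %>%
    summarise(mean_mpg = mean(mpg)) %>%
    pull(mean_mpg)
  all.equal(base_result, tidy_result)  # TRUE: both approaches agree
}
```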

Disclaimer

  • We will cover both base R and the Tidyverse

Getting started: R Projects

Application

Open 1_prep_data.R from today’s materials.

Rmarkdown

The goal of today’s workshop is to establish a workflow for using the rmarkdown package to “knit” together R code and text to create summary documents.

It might seem kind of odd to launch into a workflow that ties together R code and other stuff before we’ve actually had a chance to learn any R code, but here’s the rationale:

  1. 🛠 Foundational skill: Creating R Markdown documents gives us a way to document the rest of our skills. At each workshop, we’ll create another R Markdown document (extension .Rmd) to log the skills we worked on and any notes you’d like to keep for yourself. In this way, this skill is foundational.
  2. 🙌 Instant gratification! R Markdown documents can be rendered without much working knowledge of R at all. It’s a lovely thing when you can get something up and running RIGHT AWAY.

Markdown

Markdown refers to a set of conventions for formatting plain text. With markdown syntax, you write as you normally would in a text editor or word processor, but you signal text formatting with certain characters. Markdown (the name is a play on heavier markUP languages such as HTML) is designed to be easy to read, easy to write, and easy to learn.

*italic*
**bold**
***italic and bold!***

# First level header
## Second level header
### Third level header

1. the first item on a numbered list
2. the second item on a numbered list

- the first item on a bulleted list
- the second item on a bulleted list
  - item 2a
  
Tables look like this: 

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell

Code chunks

More advanced

  • inline R code
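
To show how code chunks and inline R code sit inside a document, the sketch below writes a tiny .Rmd file from R. The file name, title, and contents are invented for illustration. The chunk fence (three backticks) is built with strrep() only so that it can appear inside this example.

```r
# Three backticks, which open and close a code chunk in an .Rmd file
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Workshop notes\"",
  "output: html_document",
  "---",
  "",
  "## Variables",
  "",
  paste0(fence, "{r}"),   # start of an R code chunk
  "x <- 5",
  "x + 1",
  fence,                  # end of the chunk
  "",
  "The value of x is `r x`."   # inline R code inside a sentence
)

# Write the document to a temporary folder
rmd_path <- file.path(tempdir(), "notes.Rmd")
writeLines(rmd_lines, rmd_path)

# Knitting is left commented out because it needs the rmarkdown package and pandoc
# rmarkdown::render(rmd_path)
```

In RStudio you would normally type these lines straight into a new .Rmd file and press the Knit button instead.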

Getting unstuck

  • Google your errors!
  • Google “how to use XX package in r”
  • Search Twitter (#rstats)

Resources